RELATIONS BETWEEN ADJACENCY AND MODULARITY
GRAPH PARTITIONING

HANSI JIANG AND CARL MEYER

Abstract. In this paper the exact linear relation between the leading eigenvector of the
unnormalized modularity matrix and the eigenvectors of the adjacency matrix is developed. Based on
this analysis a method to approximate the leading eigenvector of the modularity matrix is given, 
and the relative error of the approximation is derived. A complete proof of the equivalence between 
normalized modularity clustering and normalized adjacency clustering is also given. A new metric is 
defined to describe the agreement of two clustering methods, and some applications and experiments 
are given to illustrate and corroborate the points that are made in the theoretical development. 
1. Introduction. The graph partitioning problem is to partition a graph into
smaller components such that the components have some specific properties. This
problem is sometimes also referred to as community structure detection in networks.
One kind of graph partitioning problem that has gained much scientific interest focuses
on partitioning the graph into components of similar size while minimizing the
number of edges cut in the process. Examples of applications are given in Section 4.

Many algorithms address this kind of problem and give good graph partitioning
results. Among the numerous methods, two clustering techniques that use spectral
properties of matrices derived from the adjacency matrices of graphs are widely used
and researched. Fiedler [6] discovered that a graph's structure is closely related to
one of the eigenvectors of the Laplacian matrix of the graph, namely the eigenvector
corresponding to the second smallest eigenvalue. Fiedler suggested in [7] using the
signs of the entries in this eigenvector to partition a graph. The clustering method
developed by Fiedler is widely referred to as spectral clustering. The concept of
modularity was first introduced by Newman and Girvan in [16], and further
explained by Newman in [15]. The modularity clustering method aims to partition
a graph while maximizing the modularity. Like the spectral clustering method
suggested by Fiedler, the modularity clustering method uses the signs of the entries
in an eigenvector, in this case the eigenvector corresponding to the largest
eigenvalue of a modularity matrix.

There are some modified versions of the spectral clustering and modularity clustering
methods. Chung [5] analyzes the properties of a scaled version of Laplacian matrices.
Shi and Malik [20] use the scaled Laplacian matrices to develop a normalized
spectral clustering method and apply it to image segmentation. Ng et al. [17] discuss
another version of normalized spectral clustering, in which a one-side scaled
Laplacian matrix is used. Bolla [2] analyzes a normalized version of modularity
clustering.

Since modularity matrices are derived from the adjacency matrices of graphs, it is
interesting to see if the same or similar clustering results can be obtained from
eigenvectors of the adjacency matrices. In this paper relations and comparisons between
clustering results from using eigenvectors of modularity matrices and adjacency
matrices will be given, and the equivalence between using normalized modularity matrices
and normalized adjacency matrices to cluster will be proved.

Throughout the paper we assume that $G(V, E)$ is a connected simple graph with
$n = |V|$ vertices and $m = |E|$ edges. Unless specifically noted, $A$ is assumed to be an
adjacency matrix of a graph, i.e.

$$A_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise.} \end{cases}$$

The degree of a vertex is $d_i = \sum_{j=1}^{n} a_{ij}$, and $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$. The number
of clusters is always fixed to be 2. If more clusters are needed, the clustering methods
can be run iteratively to build a hierarchy to get the desired number of clusters. The
signs of the entries in the eigenvectors will be used to partition the graph. Assume
there are no zero entries in the eigenvectors used. It should be noted that although
the adjacency matrices are used in this paper, extending the results to use similarity
matrices is also possible. The graph Laplacian is defined by

$$L = D - A,$$

and the modularity matrix is defined by

$$B = A - \frac{d d^T}{2m},$$

where $d = (d_1 \; d_2 \; \cdots \; d_n)^T$ is the vector containing the degrees of the nodes. The
normalized versions of the graph Laplacian and the modularity matrix are

$$L_{sym} = D^{-1/2} L D^{-1/2} \quad \text{and} \quad B_{sym} = D^{-1/2} B D^{-1/2},$$

respectively. If $e$ is a column vector with all ones, then it is easy to see that $(0, e)$ is
an eigenpair of $L$ and $B$, and $(0, D^{1/2} e)$ is an eigenpair of $L_{sym}$ and $B_{sym}$.
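
As a minimal numerical sketch of these definitions, the following Python/NumPy snippet builds $A$, $D$, $L$, $B$, $L_{sym}$ and $B_{sym}$ for a small example graph and checks the stated eigenpairs. The graph (two triangles joined by one edge) and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical example graph (not from the paper): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0          # symmetric 0/1 adjacency matrix

d = A.sum(axis=1)                    # degree vector d
m = len(edges)                       # number of edges, so d.sum() == 2m
D = np.diag(d)
L = D - A                            # graph Laplacian
B = A - np.outer(d, d) / (2 * m)     # modularity matrix

D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt  # normalized Laplacian
B_sym = D_inv_sqrt @ B @ D_inv_sqrt  # normalized modularity matrix

e = np.ones(n)
sqrt_d = np.sqrt(d)                  # this is D^{1/2} e

# Each product should vanish: (0, e) for L and B, (0, D^{1/2} e) for L_sym and B_sym.
residuals = [np.linalg.norm(M @ x)
             for M, x in [(L, e), (B, e), (L_sym, sqrt_d), (B_sym, sqrt_d)]]
print(residuals)
```

All four residual norms should be zero up to rounding error.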


The paper is organized as follows. Section 2 contains the approximation of the
leading eigenvector of the modularity matrix with eigenvectors of the adjacency matrix.
Section 3 gives the equivalence between normalized adjacency clustering and
normalized modularity clustering. Section 4 gives example applications. Conclusions are in
Section 5.

2. Dominant Eigenvectors of Modularity and Adjacency Matrices. In
this section, we will write the eigenvector corresponding to the largest eigenvalue of
a modularity matrix as a linear combination of the eigenvectors of the corresponding
adjacency matrix. Before that, we first state a theorem from [3] about the interlacing
property of a diagonal matrix and its rank-one modification and how to calculate the
eigenvectors of a diagonal plus rank one (DPR1) matrix [14]. The theorem can also
be found in [23]. These results will be used in our analysis.

Theorem 2.1. Let $C = D + \rho v v^T$, where $D$ is diagonal and $\|v\|_2 = 1$. Let
$d_1 \le d_2 \le \cdots \le d_n$ be the eigenvalues of $D$, and let
$\tilde{d}_1 \le \tilde{d}_2 \le \cdots \le \tilde{d}_n$ be the eigenvalues
of $C$. Then $\tilde{d}_1 \le d_1 \le \tilde{d}_2 \le d_2 \le \cdots \le \tilde{d}_n \le d_n$ if $\rho < 0$. If the $d_i$ are distinct and all
the elements of $v$ are nonzero, then the eigenvalues of $C$ strictly separate those of $D$.

Corollary 2.2. With the notations in Theorem 2.1, the eigenvector of $C$
corresponding to the eigenvalue $\tilde{d}_i$ is given by $(D - \tilde{d}_i I)^{-1} v$.
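
The interlacing property and the eigenvector formula can be checked numerically. The sketch below uses a hypothetical 4-by-4 DPR1 matrix with $\rho < 0$; the diagonal entries, the vector `v` and the value of `rho` are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

# Hypothetical DPR1 example: distinct diagonal entries, rho < 0,
# and a unit vector v with no zero entries.
diag = np.array([1.0, 2.0, 4.0, 7.0])        # eigenvalues d_1 < ... < d_n of D
v = np.array([1.0, 2.0, 1.0, 2.0])
v /= np.linalg.norm(v)
rho = -0.5
C = np.diag(diag) + rho * np.outer(v, v)

c_eigs = np.linalg.eigvalsh(C)               # eigenvalues of C, ascending

# Strict interlacing for rho < 0: c_1 < d_1 < c_2 < d_2 < ... < c_n < d_n.
interlaced = (all(c_eigs[i] < diag[i] for i in range(4))
              and all(diag[i] < c_eigs[i + 1] for i in range(3)))

# Corollary 2.2: (D - lam I)^{-1} v is an eigenvector of C for eigenvalue lam.
residuals = []
for lam in c_eigs:
    x = v / (diag - lam)                     # D is diagonal, so the inverse is entrywise
    residuals.append(np.linalg.norm(C @ x - lam * x) / np.linalg.norm(x))

print(interlaced, max(residuals))
```

Because `D` is diagonal, applying $(D - \tilde{d}_i I)^{-1}$ reduces to an entrywise division, which is what makes the DPR1 structure convenient.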

Theorem 2.1 tells us the eigenvalues of a DPR1 matrix are interlaced with the
eigenvalues of the original diagonal matrix. Next we will write the eigenvector
corresponding to the largest eigenvalue of a modularity matrix as a linear combination of
the eigenvectors of the corresponding adjacency matrix.

With the notations in Section 1, since $A$ is an adjacency matrix, it is symmetric
and therefore orthogonally similar to a diagonal matrix. Therefore, there exist an
orthogonal matrix $U$ and a diagonal matrix $\Sigma_A$ such that

$$A = U \Sigma_A U^T.$$

Suppose the rows and columns of $A$ are ordered such that $\Sigma_A = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$,
where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$. Let $U = (u_1 \; u_2 \; \cdots \; u_n)$. Similarly, since $B$ is
symmetric, it is orthogonally similar to a diagonal matrix. Suppose the eigenvalues
of $B$ are $\beta_1, \beta_2, \ldots, \beta_n$ with $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$.

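Theorem 2.3. Suppose $\beta_1 \ne \sigma_1$, $\beta_1 \ne \sigma_2$, and $|\beta_1 - \sigma_2| = \Delta$. Then the
eigenvector corresponding to the largest eigenvalue of $B$ is given by

$$b_1 = \frac{1}{\|U^T d\|_2} \sum_{i=1}^{n} \frac{u_i^T d}{\sigma_i - (\sigma_2 + \Delta)} \, u_i.$$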

Proof. Since $B = A - d d^T/(2m)$, we have

$$B = A - \frac{d d^T}{2m} = U \Sigma_A U^T - \frac{d d^T}{2m} = U(\Sigma_A + \rho y y^T)U^T,$$

where $y = U^T d / \|U^T d\|_2$ and $\rho = -\|U^T d\|_2^2/(2m)$. Since $\Sigma_A + \rho y y^T$ is also
symmetric, it is orthogonally similar to a diagonal matrix. So we have

$$B = U V \Sigma_B V^T U^T,$$

where $V$ is orthogonal and $\Sigma_B$ is diagonal. Since $\Sigma_A + \rho y y^T$ is a DPR1 matrix, $\rho < 0$
and $\|y\|_2 = 1$, the interlacing theorem applies to the eigenvalues of $A$ and $B$. More
specifically, we have

$$\beta_n \le \sigma_n \le \beta_{n-1} \le \sigma_{n-1} \le \cdots \le \beta_2 \le \sigma_2 < \beta_1 < \sigma_1.$$

The strict inequalities hold because $\beta_1 \ne \sigma_1$ and $\beta_1 \ne \sigma_2$. Then $|\beta_1 - \sigma_2| = \Delta$ implies
$\beta_1 - \sigma_2 = \Delta$. Let $B_1 = \Sigma_A + \rho y y^T$. Since $B = U B_1 U^T$, we have $BU = U B_1$.
Suppose $(\lambda, v)$ is an eigenpair of $B_1$, then

$$B U v = U B_1 v = \lambda U v$$

implies that $(\lambda, v)$ is an eigenpair of $B_1$ if and only if $(\lambda, Uv)$ is an eigenpair of $B$.
By Corollary 2.2, the eigenvector of $B_1$ corresponding to $\beta_1$ is given by

$$v_1 = (\Sigma_A - \beta_1 I)^{-1} y = (\Sigma_A - (\sigma_2 + \Delta) I)^{-1} \frac{U^T d}{\|U^T d\|_2},$$

and hence the eigenvector of $B$ corresponding to $\beta_1$ is given by

$$b_1 = U v_1 = U (\Sigma_A - (\sigma_2 + \Delta) I)^{-1} \frac{U^T d}{\|U^T d\|_2} = \frac{1}{\|U^T d\|_2} \sum_{i=1}^{n} \frac{u_i^T d}{\sigma_i - (\sigma_2 + \Delta)} \, u_i.$$

□
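
The formula of Theorem 2.3 can be checked numerically. The following sketch uses a hypothetical 7-node graph (a triangle joined by one edge to a 4-clique), chosen as an assumption for illustration because its two largest adjacency eigenvalues are distinct and their eigenvectors are not orthogonal to $d$, so the hypotheses of the theorem hold.

```python
import numpy as np

# Hypothetical example graph: triangle {0,1,2} joined to the 4-clique {3,4,5,6} by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
m = len(edges)
B = A - np.outer(d, d) / (2 * m)

# Eigen-decomposition of A with sigma_1 >= sigma_2 >= ... >= sigma_n.
sigma, U = np.linalg.eigh(A)
sigma, U = sigma[::-1], U[:, ::-1]

beta_all, W = np.linalg.eigh(B)
beta1, w1 = beta_all[-1], W[:, -1]           # leading eigenpair of B computed directly
delta = abs(beta1 - sigma[1])                # the gap |beta_1 - sigma_2|

# b_1 = (1/||U^T d||) * sum_i (u_i^T d) / (sigma_i - (sigma_2 + delta)) * u_i
Utd = U.T @ d
coeffs = Utd / (sigma - (sigma[1] + delta)) / np.linalg.norm(Utd)
b1 = U @ coeffs

residual = np.linalg.norm(B @ b1 - beta1 * b1) / np.linalg.norm(b1)
cosine = abs(w1 @ b1) / np.linalg.norm(b1)   # 1 if b1 is parallel to w1
print(residual, cosine)
```

The vector `b1` is not normalized, so it is compared with the directly computed eigenvector `w1` through the cosine of the angle between them.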

The point of Theorem 2.3 is to realize that the vector $b_1$ is a linear combination of
the $u_i$. Let

$$\gamma_i = \frac{u_i^T d}{(\sigma_i - \beta_1)\|U^T d\|_2}.$$

The purpose of the next theorem is to approximate $b_1$ by a linear combination of the $u_i$
that have the largest $|\gamma_i|$ and examine how good the approximation is by calculating
the norm of the difference between $b_1$ and its approximation.

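Theorem 2.4. With the notations and assumptions in Theorem 2.3, let

$$\gamma_i = \frac{u_i^T d}{(\sigma_i - \beta_1)\|U^T d\|_2}.$$

Suppose that $i_k \in \{1, 2, \ldots, n\}$, and the $\gamma_i$ are reordered such that
$|\gamma_{i_n}| \le |\gamma_{i_{n-1}}| \le \cdots \le |\gamma_{i_1}|$. Then given $p \in \{1, 2, \ldots, n\}$, $b_1$ can be approximated by

$$v = \sum_{j=1}^{p} \gamma_{i_j} u_{i_j}$$

with relative error

$$e_{rel} = \frac{\sqrt{\sum_{j=p+1}^{n} \gamma_{i_j}^2}}{q},$$

where $q$ is the 2-norm of the vector $b_1$.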
Proof. Since

$$\gamma_i = \frac{u_i^T d}{(\sigma_i - \beta_1)\|U^T d\|_2},$$

the vector $b_1$ can be written as

$$b_1 = \sum_{i=1}^{n} \gamma_i u_i = \sum_{j=1}^{n} \gamma_{i_j} u_{i_j}.$$

So if

$$v = \sum_{j=1}^{p} \gamma_{i_j} u_{i_j}$$

is an approximation of $b_1$, then the difference between $b_1$ and its approximation $v$ is

$$b_1 - v = \sum_{j=p+1}^{n} \gamma_{i_j} u_{i_j},$$

and the 2-norm of $b_1 - v$ is

$$\Big\| \sum_{j=p+1}^{n} \gamma_{i_j} u_{i_j} \Big\|_2 = \sqrt{\sum_{j=p+1}^{n} \gamma_{i_j}^2}$$

because the $u_i$ are orthonormal. So if $q$ is the 2-norm of the vector $b_1$, then the
relative error of the approximation is

$$e_{rel} = \frac{\|b_1 - v\|_2}{\|b_1\|_2} = \frac{\sqrt{\sum_{j=p+1}^{n} \gamma_{i_j}^2}}{q}.$$

□

This error formula helps us gauge the number of terms that are required to
obtain a given level of accuracy when approximating the dominant eigenvector of the
modularity matrix with eigenvectors of the adjacency matrix.
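
A minimal sketch of the truncated approximation is given below. It computes the coefficients $\gamma_i$, sorts them by decreasing magnitude, and compares the relative error formula with the directly computed error for every $p$. The helper names `gamma_coefficients` and `approximate_b1` and the example graph are hypothetical, introduced only for illustration.

```python
import numpy as np

def gamma_coefficients(A):
    """Return (gamma, U) with the columns of U sorted by decreasing |gamma_i|."""
    d = A.sum(axis=1)
    B = A - np.outer(d, d) / d.sum()              # d.sum() equals 2m
    sigma, U = np.linalg.eigh(A)
    beta1 = np.linalg.eigvalsh(B)[-1]             # largest eigenvalue of B
    Utd = U.T @ d
    gamma = Utd / ((sigma - beta1) * np.linalg.norm(Utd))
    order = np.argsort(-np.abs(gamma))
    return gamma[order], U[:, order]

def approximate_b1(gamma, U, p):
    """Use the p terms with largest |gamma_i|; return (v, relative error formula)."""
    v = U[:, :p] @ gamma[:p]
    q = np.linalg.norm(gamma)                     # equals ||b_1||_2 since the u_i are orthonormal
    e_rel = np.sqrt(np.sum(gamma[p:] ** 2)) / q
    return v, e_rel

# Hypothetical example graph: triangle {0,1,2} joined to the 4-clique {3,4,5,6} by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

gamma, U = gamma_coefficients(A)
b1 = U @ gamma                                    # the full linear combination
errors = []
for p in range(1, n + 1):
    v, e_rel = approximate_b1(gamma, U, p)
    direct = np.linalg.norm(b1 - v) / np.linalg.norm(b1)
    errors.append((e_rel, direct))
print(errors)
```

In each pair the formula value and the directly measured relative error should agree up to rounding, and the error is non-increasing in `p`.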

3. Normalized Adjacency and Modularity Clustering. Parallel to the
previous analysis, we will prove that the eigenvectors corresponding to the largest
eigenvalues of a normalized adjacency matrix and a normalized modularity matrix will
give the same clustering results in nontrivial cases. A similar statement is mentioned
in [2] without a complete proof, and it is considered in [24] from a different perspective.

Suppose $A$ is an adjacency matrix and $A_{sym} = D^{-1/2} A D^{-1/2}$ is the corresponding
normalized adjacency matrix. Let $L = D - A$ be the unnormalized Laplacian matrix
and $L_{sym} = D^{-1/2} L D^{-1/2} = I - A_{sym}$ be the normalized Laplacian matrix. Finally
let $B$ be the unnormalized modularity matrix defined in Section 1, $P = d d^T/(2m)$,
and $B_{sym} = D^{-1/2} B D^{-1/2}$ be the normalized modularity matrix. We first state the
theorem and then prove it.

Theorem 3.1. Suppose that zero is a simple eigenvalue of $B_{sym}$, and one is a
simple eigenvalue of $A_{sym}$. If $\lambda \ne 0$ and $\lambda \ne 1$, then $(\lambda, u)$ is an eigenpair of $A_{sym}$
if and only if $(\lambda, u)$ is an eigenpair of $B_{sym}$.

The proof of the theorem is obtained by combining the following two observations. 
The second observation needs more lines to explain so we write it as a lemma. 

Observation 3.2. $(\lambda, u)$ is an eigenpair of $L_{sym}$ if and only if $(1 - \lambda, u)$ is an
eigenpair of $A_{sym}$ because

$$L_{sym} u = \lambda u \iff (I - A_{sym}) u = \lambda u \iff A_{sym} u = (1 - \lambda) u.$$

Lemma 3.3. Suppose that 0 is a simple eigenvalue of both $L_{sym}$ and $B_{sym}$. It
follows that if $\lambda \ne 0$ and $(\lambda, u)$ is an eigenpair of $L_{sym}$, then $(1 - \lambda, u)$ is an eigenpair
of $B_{sym}$. If $\alpha \ne 0$ and $(\alpha, v)$ is an eigenpair of $B_{sym}$, then $(1 - \alpha, v)$ is an eigenpair
of $L_{sym}$.
Proof. For $P = d d^T/(2m)$, it is easy to observe that

$$B_{sym} + L_{sym} = D^{-1/2}(A - P + D - A)D^{-1/2} = I - D^{-1/2} P D^{-1/2}.$$

Let $E = D^{-1/2} P D^{-1/2}$. If $(\lambda, u)$ is an eigenpair of $L_{sym}$, we have

$$\lambda u = L_{sym} u \;\Rightarrow\; \lambda u = (I - B_{sym} - E) u \;\Rightarrow\; (1 - \lambda) u = B_{sym} u + E u.$$

Note that $P$ is an outer product and $P \ne 0$, so $\mathrm{rank}(P) = 1$. Because
$E = D^{-1/2} P D^{-1/2}$ is congruent to $P$, $E$ and $P$ have the same number of positive, negative
and zero eigenvalues by Sylvester's law [14]. Therefore $\mathrm{rank}(E) = \mathrm{rank}(P) = 1$. To prove
$Eu = 0$, it is sufficient to prove $u$ is in the nullspace of $E$.

Let $e$ be the vector such that all its entries are one. Observe that

$$E D^{1/2} e = D^{-1/2} P D^{-1/2} D^{1/2} e = D^{-1/2} \frac{d d^T}{2m} e = \frac{d^T e}{2m} (D^{-1/2} d) = D^{-1/2} d,$$

since $d^T e = \sum_{i=1}^{n} d_i = 2m$ is just the sum of the degrees of all the nodes in the graph.
Because $D^{-1/2} d = D^{1/2} e$, $(1, D^{1/2} e)$ is an eigenpair of $E$. Also observe that

$$L_{sym} D^{1/2} e = D^{-1/2}(D - A) D^{-1/2} D^{1/2} e = D^{-1/2} L e = 0.$$

Therefore, $(0, D^{1/2} e)$ is an eigenpair of $L_{sym}$. Since $u$ is an eigenvector of the symmetric
matrix $L_{sym}$ corresponding to a nonzero eigenvalue $\lambda$, we have $u \perp D^{1/2} e$. Because $E$ is
symmetric of rank one and $D^{1/2} e$ spans its range, the nullspace of $E$ is the orthogonal
complement of $D^{1/2} e$, so $u$ is in the nullspace of $E$.
This gives $Eu = 0$ and thus $(1 - \lambda) u = B_{sym} u$. Therefore $\lambda u = L_{sym} u \Rightarrow (1 - \lambda) u = B_{sym} u$.


On the other hand, if $(\alpha, v)$ is an eigenpair of $B_{sym}$, then we have

$$\alpha v = B_{sym} v \;\Rightarrow\; \alpha v = (I - L_{sym} - E) v \;\Rightarrow\; L_{sym} v + E v = (1 - \alpha) v.$$

Observe that

$$B_{sym} D^{1/2} e = D^{-1/2} B D^{-1/2} D^{1/2} e = D^{-1/2} B e = 0$$

because the row sums of $B$ are all zeros. Therefore, $(0, D^{1/2} e)$ is an eigenpair of $B_{sym}$.
Since $v$ is an eigenvector of $B_{sym}$ corresponding to a nonzero eigenvalue $\alpha$, we have
$v \perp D^{1/2} e$, so $v$ is in the nullspace of $E$. This gives $Ev = 0$ and thus $(1 - \alpha) v = L_{sym} v$.
Therefore $\alpha v = B_{sym} v \Rightarrow (1 - \alpha) v = L_{sym} v$. □
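
The identities used in the proof can be verified numerically. The sketch below checks that $B_{sym} + L_{sym} = I - E$, that $E$ has rank one with eigenpair $(1, D^{1/2} e)$, and that the spectrum of $B_{sym}$ consists of 0 together with $1 - \lambda$ for the nonzero eigenvalues $\lambda$ of $L_{sym}$. The example graph is a hypothetical choice made for illustration.

```python
import numpy as np

# Hypothetical example graph: triangle {0,1,2} joined to the 4-clique {3,4,5,6} by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
two_m = d.sum()

D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
P = np.outer(d, d) / two_m
L_sym = D_inv_sqrt @ (np.diag(d) - A) @ D_inv_sqrt
B_sym = D_inv_sqrt @ (A - P) @ D_inv_sqrt
E = D_inv_sqrt @ P @ D_inv_sqrt

identity_holds = np.allclose(B_sym + L_sym, np.eye(n) - E)
rank_E = np.linalg.matrix_rank(E)
sqrt_d = np.sqrt(d)                                  # D^{1/2} e
E_fixes_sqrt_d = np.allclose(E @ sqrt_d, sqrt_d)     # (1, D^{1/2} e) is an eigenpair of E

# The graph is connected, so the smallest eigenvalue of L_sym is the simple zero.
lam = np.linalg.eigvalsh(L_sym)                      # ascending, lam[0] is about 0
expected_B_spectrum = np.sort(np.concatenate(([0.0], 1.0 - lam[1:])))
spectrum_matches = np.allclose(np.linalg.eigvalsh(B_sym), expected_B_spectrum)
print(identity_holds, rank_E, E_fixes_sqrt_d, spectrum_matches)
```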

By Theorem 3.1, a bijection from the nonzero eigenvalues of $B_{sym}$ to the
eigenvalues of $A_{sym}$ that are not equal to one can be established, and the order of these
eigenvalues is maintained. Since zero is always an eigenvalue of $B_{sym}$, the largest
eigenvalue of $B_{sym}$ is always nonnegative. Newman [15] gives a discussion of when
the largest eigenvalue of $B$ can be zero. Since $B_{sym}$ and $B$ are congruent, it follows
that if zero is the largest eigenvalue of $B$, then it is also the largest eigenvalue of $B_{sym}$.
In this case, all nodes in the graph will be put into one cluster because $(0, D^{1/2} e)$ is an
eigenpair of $B_{sym}$ and all entries in the vector $D^{1/2} e$ are larger than zero. The following
theorem establishes that the eigenvectors corresponding to the largest eigenvalues of
a normalized adjacency matrix and a normalized modularity matrix are the same for
nontrivial cases (i.e. when the largest eigenvalue of $B$ is not zero), and therefore they
will provide the same clustering results in nontrivial cases.

Theorem 3.4. With the assumptions in Theorem 3.1, and given zero is not the
largest eigenvalue of $B_{sym}$, the eigenvector corresponding to the largest eigenvalue of
$B_{sym}$ and the eigenvector corresponding to the second largest eigenvalue of $A_{sym}$ are
identical.

Proof. Since $L_{sym}$ is positive semi-definite [22], zero is the smallest eigenvalue
of $L_{sym}$. Then by Observation 3.2, one is the largest eigenvalue of $A_{sym}$. Since all
eigenvalues of $A_{sym}$ that are not equal to one are also the eigenvalues of $B_{sym}$, it
follows that if the simple zero eigenvalue is not the largest eigenvalue of $B_{sym}$, then
the largest eigenvalue of $B_{sym}$ is the second largest eigenvalue of $A_{sym}$ and they have
the same eigenvectors by Theorem 3.1. □
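
Theorem 3.4 can be illustrated numerically. The sketch below compares the leading eigenvector of $B_{sym}$ with the eigenvector of the second largest eigenvalue of $A_{sym}$ and partitions the nodes by sign. The example graph is a hypothetical one, chosen so that the largest eigenvalue of $B_{sym}$ is positive and simple.

```python
import numpy as np

# Hypothetical example graph: triangle {0,1,2} joined to the 4-clique {3,4,5,6} by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)

D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_sym = D_inv_sqrt @ A @ D_inv_sqrt
B_sym = D_inv_sqrt @ (A - np.outer(d, d) / d.sum()) @ D_inv_sqrt

mu, UA = np.linalg.eigh(A_sym)          # ascending eigenvalues of A_sym
nu, UB = np.linalg.eigh(B_sym)          # ascending eigenvalues of B_sym
u = UA[:, -2]                           # second largest eigenvalue of A_sym
w = UB[:, -1]                           # largest eigenvalue of B_sym
if u @ w < 0:                           # eigenvectors are defined only up to sign
    w = -w

same_eigenvalue = bool(np.isclose(mu[-2], nu[-1]))
alignment = abs(u @ w)                  # 1 when the unit vectors coincide
same_partition = bool(np.all(np.sign(u) == np.sign(w)))
clusters = u > 0                        # sign-based bipartition of the nodes
print(same_eigenvalue, alignment, same_partition, clusters)
```

On this graph the sign pattern separates the triangle from the 4-clique, which is the partition one would expect from the single bridging edge.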

4. Some Applications and Experiments. To corroborate the theoretical
results obtained in the previous sections, experiments were conducted with three
well-known data sets. In the following experiments the effects of the units are first
eliminated by normalizing each variable by the 2-norm if necessary, then the Gaussian
similarity function is applied to the data to generate a similarity matrix $S$. The
parameters in the similarity function used for different data sets are different, and will
be specified individually. The mean, $\bar{s}$, of all off-diagonal entries in $S$ is computed
and the adjacency matrix is formed by

$$A_{ij} = \begin{cases} 1 & \text{if } i \ne j \text{ and } S_{ij} > \bar{s} \\ 0 & \text{otherwise.} \end{cases}$$
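
The construction of the adjacency matrix can be sketched as follows. The paper does not spell out the Gaussian similarity function, so the common form $S_{ij} = \exp(-\|x_i - x_j\|_2^2 / (2\sigma^2))$ is assumed here; the function name `adjacency_from_data`, the toy data and the value of `sigma` are likewise hypothetical.

```python
import numpy as np

def adjacency_from_data(X, sigma, normalize=True):
    """Threshold a Gaussian similarity matrix at the mean of its off-diagonal entries.

    Assumption: S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); the source only says
    that "the Gaussian similarity function" is used.
    """
    X = np.asarray(X, dtype=float)
    if normalize:
        norms = np.linalg.norm(X, axis=0)         # 2-norm of each variable (column)
        norms[norms == 0] = 1.0                   # leave all-zero columns unchanged
        X = X / norms
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    off_diag = ~np.eye(len(X), dtype=bool)
    s_bar = S[off_diag].mean()                    # mean of all off-diagonal entries
    return ((S > s_bar) & off_diag).astype(float)

# Toy data (hypothetical): two well-separated groups of three points each.
X_demo = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
A_demo = adjacency_from_data(X_demo, sigma=0.5)
print(A_demo)
```

For this toy data the within-group similarities lie above the off-diagonal mean and the between-group similarities lie below it, so the resulting graph consists of two disjoint triangles.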

4.1. Data Sets. We used three popular data sets from the literature, and they 
are described below. 

4.1.1. Wine Recognition Data Set. The wine recognition data from the UCI
data repository [12] is one of the most famous data sets used in data mining [8] [10] [18].
The data set is a result of chemical analysis of wines grown in the same region. The
difference between the wines is that they are derived from three different cultivars.
The data contains 178 wine samples, the labels of the samples that tell which kind
of wine each sample is, and 13 variables from chemical analysis. In the experiments
the data of the first two kinds of wines are used to generate a similarity matrix. A
good clustering method should be able to put samples from the same classes into the
same clusters. To build the similarity matrix, the Gaussian similarity function with
$\sigma = 0.1$ is used.
4.1.2. Breast Cancer Wisconsin (Original) Data Set. The Breast Cancer
Wisconsin Data Set [13] is a widely used data set in classification and clustering [1] and
it can be downloaded from the UCI data repository. The data contains 699 instances
and 9 attributes. The attributes are measurements of the sample tissues. Each instance
has a label to indicate whether the tissue is benign or malignant. The data contains
16 instances that have missing values in the attributes, and the missing values are
replaced by zeros in the experiments. To build the similarity matrix, the Gaussian
similarity function with $\sigma = 0.1$ is used.

4.1.3. PenDigit Data Sets from MNIST database. The PenDigit data sets
are subsets of the widely used MNIST database [11] [25] [9] [4] [19]. The original data
contains a training set of 60,000 handwritten digits from 44 writers. One subset used
in the experiments contains some of the digits 1 and 7, and the other subset contains
some of the digits 2 and 3.¹ Each piece of data is a row vector converted from a
greyscale image. Each image is 28 pixels in height and 28 pixels in width, so there are 784
pixels in total. Each row vector contains the label of the digit and the lightness of each
pixel. Lightness of a pixel is represented by a number from 0 to 255 inclusive, and
smaller numbers represent lighter pixels. To build the similarity matrix, the Gaussian
similarity function with $\sigma = 1{,}000$ is used.

4.2. Clustering Synchronization Rate. In classification methods the
performance is gauged by using metrics such as accuracy and error rate. More specifically,
the accuracy is defined as the number of correct classifications over the total number
of classifications, and the error rate is defined as the number of wrong classifications
over the total number of classifications [21]. Similar metrics can be used to evaluate
how close the results from two different clustering methods are. Suppose two
clustering methods $M_1$ and $M_2$ are used to partition the same data into two parts.
Define the clustering synchronization rate (CSR) between $M_1$ and $M_2$ to be

$$\mathrm{CSR}(M_1, M_2) = \frac{\max(\text{number of data put in the same clusters by } M_1 \text{ and } M_2)}{\text{total number of data}}. \qquad (4.1)$$

In other words this is the percentage of agreement between $M_1$ and $M_2$. Because
different clustering methods may give different labels to the same cluster, the max function is
used in the definition; the maximum is taken over the possible matchings of cluster
labels. If $M_2$ is the known "ground truth" of the clusters, then the
CSR between $M_1$ and $M_2$ is the same as the accuracy of $M_1$. It should be noted
that the CSR says nothing about the accuracy of the clustering methods unless one of
them is the ground truth.
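
For two clusters, the CSR in (4.1) can be computed with a few lines of code. The sketch below assumes both methods report their clusters with the same two label symbols (for example 0 and 1), so that the only ambiguity is a swap of the two labels; the function name `csr` is hypothetical.

```python
def csr(labels_1, labels_2):
    """Clustering synchronization rate for two bipartitions of the same data.

    Assumes both label sequences use the same two symbols (e.g. 0 and 1).
    Swapping the labels of one method turns every agreement into a
    disagreement, so the maximum over the two matchings is
    max(agreements, n - agreements) / n.
    """
    if len(labels_1) != len(labels_2) or len(labels_1) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_1)
    agreements = sum(1 for a, b in zip(labels_1, labels_2) if a == b)
    return max(agreements, n - agreements) / n

# Identical partitions with swapped labels are in full agreement.
print(csr([0, 0, 1, 1], [1, 1, 0, 0]))
# One of four points is assigned differently.
print(csr([0, 0, 1, 1], [0, 1, 1, 1]))
```

Under this definition the CSR of two bipartitions never falls below 0.5, which is worth keeping in mind when reading the tables.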

4.3. Results. The experimental results are in the tables below. Table 1 contains
the number of data points in each data set and the accuracy of each clustering method
when applied to the data sets. The symbols $L$, $B$ and $L_{sym}$ in the table represent the
unnormalized spectral clustering, unnormalized modularity clustering and normalized
spectral clustering, respectively. These clustering results are used as the benchmarks.
Note that by Theorem 3.4, the clustering results from using $L_{sym}$ and $B_{sym}$ are the
same. The columns $A_p$ use the approximations described in Theorem 2.4 with the
first $p$ eigenvectors of the adjacency matrix to do clustering. In Table 2, the CSRs
between the leading eigenvector of $B$ and its approximations are computed, and the
largest four magnitudes of the $\gamma_i$ described in Theorem 2.4 are listed.


¹ The data can be downloaded at http://www.kaggle.com/c/digit-recognizer/data
Data            Number of data points   L      B      L_sym   A_1    A_2    A_3
Wine            130                     56.2   91.2   92.3    91.2   91.2   91.2
Breast Cancer   699                     70.0   96.6   96.3    87.0   96.7   96.7
PenDigit17      9085                    51.8   96.9   96.3    82.9   96.5   97.0
PenDigit23      8528                    51.2   90.1   88.2    89.2   90.2   90.3

Table 1: This table records the accuracy of the clustering methods on the data sets
where the accuracy is the number of correct classifications over the total number of
classifications. All numbers are percentages (%) except for those in the number of data
points column.


Data            CSR(A_1,B)   CSR(A_2,B)   CSR(A_3,B)   |γ_{i_1}|   |γ_{i_2}|   |γ_{i_3}|   |γ_{i_4}|
Wine            100.0        100.0        100.0        1344.7      21.1        1.2         0.53
Breast Cancer   88.4         99.9         99.9         59.5        31.2        2.1         0.45
PenDigit17      82.8         99.0         99.6         265.2       146.3       42.6        7.1
PenDigit23      94.2         99.6         99.6         653.0       150.1       17.2        5.1

Table 2: This table records the CSRs as defined in (4.1) between the leading
eigenvector of B and its first three approximations, and the four largest magnitudes of the γ_i.
The CSRs are percentages (%).

From Table 1, it can be seen that the approximations of the leading eigenvector
of B outperform the unnormalized spectral clustering method for all data sets
considered, and the accuracy is about the same as that of the unnormalized modularity
method and the normalized spectral clustering method. In some cases, the clustering
results from the approximations are better than the benchmarks. From Table 2, it can
be seen that the CSRs between the leading eigenvector of B and its approximations
are higher than 80%. If two or three eigenvectors of A are used, then the CSRs are
at least 99%.

5. Conclusion. In this paper the exact linear relation between the leading
eigenvector of the unnormalized modularity matrix and the eigenvectors of the adjacency
matrix is developed. It is proven that the leading eigenvector of the modularity matrix
can be written as a linear combination of the eigenvectors of the adjacency matrix,
and the coefficients in the linear combination are determined. Then a method to
approximate the leading eigenvector of the modularity matrix is given, and the
relative error of the approximation is derived. It is also proven that when the largest
eigenvalue of the modularity matrix is nonzero, the normalized modularity
clustering method will give the same clustering results as obtained by using the eigenvector
corresponding to the second largest eigenvalue of the normalized adjacency matrix. A new
metric, the clustering synchronization rate, is defined to compare different clustering
methods. Some applications and experiments are given to illustrate and corroborate
the points that are made in the theoretical development.